Techniques and devices for automatic speech recognition

Author

  • Kjell Elenius
Abstract

For some decades the possibility of automatic speech recognition has intrigued many speech researchers as well as the general public. This dissertation, which contains ten different speech studies and a summary, is mostly about "traditional speech recognition", meaning the conversion of the acoustic speech signal to phonetic features, phonemes and words. This is in contrast to "speech understanding", in which the system is required to respond adequately to the "meaning" of the input speech, which demands linguistic components such as syntax, semantics and pragmatics. The parameterisation of the speech signal is handled by the acoustic front end of a recognition system. The standard technique has been to convert the speech signal to spectral information, for instance filter bank amplitudes. Frequently, cepstral coefficients are derived instead of amplitude parameters. These traditional techniques for speech parameterisation are basically static descriptions. In one study it is shown that adding spectral change parameters (differentiated cepstrum coefficients) to regular cepstrum coefficients improves recognition performance. Another study attempted to use parameters inspired by models of human auditory processing. However, there was a negative correlation between the complexity of the models used and the recognition performance, although some of the simpler models performed well. The possibility of using synthetic speech to build templates for a word recognition system has also been examined, with some encouraging results. This technique has some attractive qualities, since it could dramatically reduce the need for training a recogniser. Some papers are oriented more towards applications of word recognition. One is a feasibility study of voice-controlled dialling in an intercom system, and one is about the construction of a portable hardware recogniser.
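To illustrate the difference between static cepstral parameters and the spectral change parameters mentioned above, here is a minimal sketch in plain Python. It assumes cepstra are obtained as a DCT-II of log filter bank amplitudes and that the "differentiated" coefficients are computed with a common regression formula over ±2 frames with replicated edges. The function names, window size and number of coefficients are illustrative assumptions, not the formulation used in the dissertation.

```python
import math


def cepstrum_from_filterbank(amplitudes, num_ceps):
    """DCT-II of log filter bank amplitudes, returning c_1..c_num_ceps (c_0 omitted)."""
    if any(a <= 0 for a in amplitudes):
        raise ValueError("filter bank amplitudes must be positive to take the log")
    log_e = [math.log(a) for a in amplitudes]
    m = len(log_e)
    return [
        sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
        for i in range(1, num_ceps + 1)
    ]


def delta(frames, window=2):
    """Regression-based time derivative of each coefficient over +/- window frames.

    Frames beyond the start or end of the utterance are replaced by the first or
    last frame (edge replication).
    """
    t_count = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(t_count):
        d = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = frames[min(t + n, t_count - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out


def static_plus_delta(filterbank_frames, num_ceps=8, window=2):
    """Concatenate static cepstra with their deltas, frame by frame."""
    ceps = [cepstrum_from_filterbank(f, num_ceps) for f in filterbank_frames]
    deltas = delta(ceps, window)
    return [c + d for c, d in zip(ceps, deltas)]
```

Each output frame has twice as many parameters as the static cepstrum alone. The delta half is zero for a steady sound and non-zero where the spectrum is changing, which is the kind of dynamic information a purely static description lacks.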
Another deals with reducing the effect of background noise, especially when using speech recognition for voice-controlled dialling in cars. The method used is shown to improve the noise immunity of the system. There are furthermore some papers in which artificial neural networks are used for phoneme and word recognition. It is shown that including coarticulation effects between phonemes is a crucial factor and that recurrent neural networks in particular seem to be useful for this. Some experiments have been carried out to evaluate whether articulatorily based phonetic features are a better basis than phonemes for speaker-independent word recognition. However, it seems that this is not the case, although the results are not fully conclusive. The last two studies deal with some linguistic aspects of speech technology. The first is a comparison of the "phonetic" word space of five European languages, especially in the context of speech recognition with broad phonetic classes, and the results indicate that the lexical space is very unevenly exploited. The second compares word class assignment by explicit symbolic rules to an artificial neural network trained for the same task. The network-based method is shown to perform somewhat better.
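One way to picture recognition with broad phonetic classes is to map each word's phoneme string onto a coarse class string and see how many words collapse onto the same pattern. The sketch below does this for a toy lexicon. The class inventory (S = stop, F = fricative, N = nasal, L = liquid/glide, V = vowel), the single-letter phoneme symbols and the lexicon are all hypothetical and are not the class set or data used in the dissertation.

```python
from collections import defaultdict

# Hypothetical broad-class inventory, for illustration only.
BROAD_CLASS = {
    "p": "S", "b": "S", "t": "S", "d": "S", "k": "S", "g": "S",
    "f": "F", "v": "F", "s": "F", "z": "F", "h": "F",
    "m": "N", "n": "N",
    "l": "L", "r": "L", "j": "L", "w": "L",
    "a": "V", "e": "V", "i": "V", "o": "V", "u": "V",
}


def broad_pattern(phonemes):
    """Map a phoneme sequence to its broad-class string, e.g. b a t -> 'SVS'."""
    return "".join(BROAD_CLASS[p] for p in phonemes)


def cohorts(lexicon):
    """Group words that share the same broad-class pattern."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[broad_pattern(phonemes)].append(word)
    return dict(groups)


def summarize(lexicon):
    """Return (number of patterns, patterns naming a single word, largest group size)."""
    groups = cohorts(lexicon)
    unique = sum(1 for words in groups.values() if len(words) == 1)
    largest = max(len(words) for words in groups.values())
    return len(groups), unique, largest


# Toy lexicon with made-up transcriptions.
TOY_LEXICON = {
    "bat": ["b", "a", "t"], "pet": ["p", "e", "t"], "dog": ["d", "o", "g"],
    "man": ["m", "a", "n"], "sun": ["s", "u", "n"], "fan": ["f", "a", "n"],
    "lip": ["l", "i", "p"],
}
```

For `TOY_LEXICON`, three words share the pattern `SVS` while `NVN` and `LVS` each identify a single word. A skewed distribution of group sizes like this is what "unevenly exploited" lexical space refers to, although a real analysis would use full pronunciation lexicons.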

Publication date: 2007